Abstract


If a parametric model for the ability distribution is not assumed, then the customary two-parameter
and three-parameter logistic models for item response analysis present identifiability problems not
encountered with the Rasch model. These problems impose substantial restrictions on possible
models for ability distributions.

In the two-parameter logistic (2PL) and three-parameter logistic (3PL) models commonly
employed in item response analysis, parameter estimation is typically accomplished by
marginal maximum likelihood based on the assumption of a normal ability distribution (Bock
& Lieberman, 1970; Bock & Aitkin, 1981); however, attempts have been made to consider
marginal estimation with less restricted ability distributions (Heinen, 1996, chap. 6). In Section 2,
results previously derived for the Rasch model (Cressie & Holland, 1983) are used to
demonstrate that marginal estimation with unrestricted ability distributions is problematic for
the 2PL and 3PL models, because the general 2PL and 3PL models place insufficient restrictions
on the joint distribution of the item responses to permit identification of model parameters. In
Section 3, the problems encountered with the general 2PL and 3PL models are shown not to exist
in the Rasch model, and conditions are provided under which restricted versions of the 2PL and
3PL models place adequate restrictions on the joint distribution of the item responses so that
estimation of model parameters can be considered.

Nonetheless, as noted in Section 4, estimation of parameters may remain impractical even in 
cases in which 2PL and 3PL models place adequate restrictions for parameter estimation to be 
possible in principle. This issue is examined in the context of latent class models. 

1. General Marginal Estimation 

Addressing the fundamental difficulty with marginal estimation under less restricted ability
distributions requires some general results concerning maximum likelihood estimation for item
responses. In a test with binary responses, random variables $Y_{ij}$, $1 \le j \le q$, $1 \le i \le n$, are
observed, where $n \ge 1$ and $q \ge 2$ are integers, and $Y_{ij}$ represents the response of examinee $i$ on item
$j$ of the test. The possible values of $Y_{ij}$ are 1 (correct) and 0 (incorrect). Let $Y_i$, $1 \le i \le n$, denote
the $q$-dimensional vector of responses $Y_{ij}$, $1 \le j \le q$. If the examinees can be regarded as a simple
random sample from an infinite population of possible examinees, then the $Y_i$ are independent
and identically distributed.

To characterize the distribution of $Y_i$, some preliminary notation is helpful. Let $J$ be the set
of vectors of dimension $q$ with coordinates 0 or 1, so that $J$ has $m = 2^q$ elements and $Y_i$ is in $J$.
Let $R^J$ be the set of arrays $r$ with real coordinates $r_y$ for $y$ in $J$, and let $S$ be the unit simplex in
$R^J$, so that $S$ consists of $r$ in $R^J$ such that $r_y \ge 0$ for $y$ in $J$ and $\sum_{y \in J} r_y = 1$. Let $S_0$ be the
set of $r$ in $S$ with all coordinates positive. For $x$ real, let $\mathbf{x}$ be the member of $R^J$ with all elements
equal to $x$.

The distribution of $Y_i$ is characterized by $p$ in $S$, where, for $y$ in $J$, $p_y$ is the probability that
$Y_i = y$.

Inferences concerning $p$ may be based on the array $f$ of relative frequencies, where, for $y$ in
$J$, $f_y$ is the fraction of the examinees $i$ with $Y_i = y$. The array $f$ in $S$ is then a sufficient statistic
for $p$. The log likelihood function $\ell$ then satisfies

\[ \ell(p) = \sum_{y \in J} f_y \log p_y \]

for $p$ in $S$, where the convention $0 \log 0 = 0$ is used. For any nonempty subset $T$ of $S$, $\hat{p}$ is a
maximum likelihood estimate of $p$ for the model $M(T)$ that $p$ is in $T$ if $\hat{p}$ is in $T$ and $\ell(\hat{p})$ is the
supremum $\ell(T)$ of $\ell(p)$ for $p$ in $T$.

It is well known that, for the unrestricted model $M(S)$, the unique maximum likelihood
estimate of $p$ is $f$. If $f$ is in $T$ for a subset $T$ of $S$, then $f$ is also the unique maximum likelihood
estimate of $p$ for model $M(T)$, for

\[ \ell(f) = \ell(S) \ge \ell(T) \]

and

\[ \ell(f) \le \ell(T). \]

Thus the maximum $\ell(T)$ of the log likelihood for model $M(T)$ is the same as the maximum $\ell(S)$
of the log likelihood for model $M(S)$. The log likelihood ratio test statistic is then

\[ L^2 = 2n[\ell(S) - \ell(T)] = 0. \]

In this fashion, no evidence exists to discriminate between models $M(T)$ and $M(S)$ even if $T$ is a
proper subset of $S$.
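
As a concrete illustration of these quantities, the following Python sketch computes the array of relative frequencies and evaluates the log likelihood at two candidate arrays. The response data and the helper names `relative_frequencies` and `log_likelihood` are hypothetical and serve only as an example; they are not taken from any program discussed in this report.

```python
import math
from collections import Counter
from itertools import product


def relative_frequencies(responses, q):
    """Return f as a dict over all 2**q response patterns y in J."""
    counts = Counter(tuple(r) for r in responses)
    n = len(responses)
    return {y: counts.get(y, 0) / n for y in product((0, 1), repeat=q)}


def log_likelihood(f, p):
    """Evaluate sum_y f_y log p_y with the convention 0 log 0 = 0."""
    total = 0.0
    for y, fy in f.items():
        if fy == 0.0:
            continue  # 0 log 0 = 0, and 0 log p_y = 0 for p_y > 0
        if p[y] == 0.0:
            return -math.inf
        total += fy * math.log(p[y])
    return total


# Hypothetical data: n = 8 examinees, q = 2 items.
responses = [(0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
f = relative_frequencies(responses, q=2)
uniform = {y: 0.25 for y in f}

loglik_at_f = log_likelihood(f, f)
loglik_at_uniform = log_likelihood(f, uniform)
```

In this example the log likelihood at $f$ exceeds the value at the uniform array, as expected when $f$ is the maximum likelihood estimate under the unrestricted model.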

In large samples, it is a simple matter to find a condition under which, as the sample size $n$
approaches $\infty$, the probability approaches 1 that $f$ is the unique maximum likelihood estimate.
Let the interior $\mathrm{ir}(T)$ of $T$ relative to $S$ be the union of all sets $O \cap S \subset T$ such that $O$ is an open
subset of $R^J$. Let model $M(T)$ be said to be locally unrestricted if $\mathrm{ir}(T)$ is nonempty, and let the
model be locally unrestricted at $p$ in $T$ if $p$ is in $\mathrm{ir}(T)$. The model is said to be locally restricted
if it is not locally unrestricted. If model $M(T)$ is locally unrestricted at $p$ in $\mathrm{ir}(T)$, then the weak
law of large numbers implies that, as the sample size $n$ approaches $\infty$, the probability approaches
1 that $f$ is in $\mathrm{ir}(T)$ and $\hat{p} = f$ (Cramér, 1946, p. 254). Thus the probability approaches 1 that the
test statistic $L^2 = 0$. In this fashion, it is clearly undesirable for a model to be locally unrestricted.

General Models for Item Responses With One-Dimensional Ability 

Many apparently reasonable models for item responses are locally unrestricted. This problem
is examined in this section in terms of one-parameter logistic (1PL), two-parameter logistic
(2PL), and three-parameter logistic (3PL) models (Hambleton, Swaminathan, & Rogers, 1991,
chap. 2). To develop a general framework for the discussion, consider the general one-dimensional
model for item responses in which an ability parameter has one dimension. To each examinee
$i$, associate an unobserved random variable $\theta_i$ that represents the ability of that examinee. The
local independence assumption is made that, for each examinee $i$, the responses $Y_{ij}$, $1 \le j \le q$,
are conditionally independent given $\theta_i$. It is also assumed that the pairs $(Y_i, \theta_i)$, $1 \le i \le n$, are
independent and identically distributed. This latter assumption is consistent with the previous
assumption that the $Y_i$ are independent and identically distributed.

The common distribution function of the ability parameter $\theta_i$ is denoted by $F$. Associated
with each item $j$ is a nondecreasing item characteristic curve (ICC) $P_j$, $0 < P_j < 1$, such that, for
each real $\theta$, each item $j$, $1 \le j \le q$, and each examinee $i$, $1 \le i \le n$, $P_j(\theta)$ is the probability
that $Y_{ij} = 1$ given that $\theta_i = \theta$. Let

\[ Q_j = 1 - P_j \tag{1} \]

be the ICC of $1 - Y_{ij}$, and let the item logit function $\lambda_j$ be

\[ \lambda_j = \log(P_j/Q_j) \tag{2} \]

(Holland, 1990), so that

\[ P_j = [1 + \exp(-\lambda_j)]^{-1} \tag{3} \]

and

\[ Q_j = [1 + \exp(\lambda_j)]^{-1}. \tag{4} \]

Let $\lambda$ be the $q$-dimensional function with coordinates $\lambda_j$ for $1 \le j \le q$. For $q$-dimensional vectors
$a$ and $b$ with respective coordinates $a_j$ and $b_j$ for $1 \le j \le q$, let

\[ a'b = \sum_{j=1}^{q} a_j b_j. \]

Under the item response model, $p$ is in $S_0$, and

\[ p_y = \int \prod_{j=1}^{q} P_j^{y_j} Q_j^{1 - y_j} \, dF, \quad y \in J, \tag{5} \]

(Cressie & Holland, 1983), so that the item characteristic curves $P_j$, $1 \le j \le q$, and the
distribution $F$ determine the common joint distribution of the $Y_i$. Equivalently, if

\[ V = \prod_{j=1}^{q} [1 + \exp(\lambda_j)]^{-1}, \tag{6} \]

then the following variant on the Dutch identity holds:

\[ p_y = \int V \exp(y'\lambda) \, dF \tag{7} \]

(Holland, 1990).
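
The equivalence of (5) and (7) can be checked numerically. The sketch below is only an illustration under stated assumptions: the ability distribution $F$ is taken to be discrete with three support points, the item logits have the linear form $\lambda_j(\theta) = a_j\theta - \gamma_j$ introduced later in (9), and all numerical values and function names are hypothetical.

```python
import math
from itertools import product


def probs_eq5(a, gamma, points, weights):
    """p_y from (5): average prod_j P_j^{y_j} Q_j^{1-y_j} over a discrete F."""
    q = len(a)
    p = {}
    for y in product((0, 1), repeat=q):
        total = 0.0
        for theta, w in zip(points, weights):
            term = w
            for j in range(q):
                # ICC from (3) with logit a_j * theta - gamma_j
                P = 1.0 / (1.0 + math.exp(-(a[j] * theta - gamma[j])))
                term *= P if y[j] == 1 else 1.0 - P
            total += term
        p[y] = total
    return p


def probs_eq7(a, gamma, points, weights):
    """p_y from (7): average V exp(y'lambda) over the same discrete F."""
    q = len(a)
    p = {}
    for y in product((0, 1), repeat=q):
        total = 0.0
        for theta, w in zip(points, weights):
            lam = [a[j] * theta - gamma[j] for j in range(q)]
            V = 1.0
            for l in lam:
                V /= 1.0 + math.exp(l)  # product in (6)
            total += w * V * math.exp(sum(yj * l for yj, l in zip(y, lam)))
        p[y] = total
    return p


# Hypothetical parameter values for q = 3 items.
a = [0.8, 1.0, 1.5]
gamma = [-0.5, 0.0, 0.7]
points = [-1.0, 0.0, 1.5]
weights = [0.3, 0.5, 0.2]

p5 = probs_eq5(a, gamma, points, weights)
p7 = probs_eq7(a, gamma, points, weights)
```

Both functions return the same array $p$ up to rounding error, and the coordinates are positive and sum to 1, so that $p$ is in $S_0$.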

Let $\mathcal{P}$ be the set of $q$-dimensional functions $P$ on the real line with coordinates $P_j$, $1 \le j \le q$,
that are strictly increasing real functions on the real line with values in $(0, 1)$. Let $\Lambda$ be the set
of $q$-dimensional functions $\lambda$ with coordinates $\lambda_j$, $1 \le j \le q$, that are strictly increasing real
functions on the real line. Let $\mathcal{F}$ be the set of real functions that are distribution functions of real
random variables. The assumption that the general one-dimensional model holds states in effect
that $p$ is in the subset $S_m$ of all $p$ in $S_0$ such that (5) holds for some $P$ in $\mathcal{P}$ and some $F$ in $\mathcal{F}$
(Cramér, 1946, p. 57). The set $S_m$ is a proper subset of $S_0$ (Holland & Rosenbaum, 1986). The
set $S_m$ can also be defined to be the set of $p$ in $S_0$ such that (6) and (7) hold for some $\lambda$ in $\Lambda$ and
some $F$ in $\mathcal{F}$.

In an item response model with a one-dimensional ability parameter, $p$ is assumed to belong
to a nonempty subset $T$ of $S_m$. In this section, some common item response models are examined
to determine whether they are locally unrestricted. The results summarized are obtained from
Theorem 5 and from dimensional analysis.


Rasch Models 

In a general Rasch (1PL) model, $\lambda$ is in the set $\Lambda_1$ of $\lambda$ in $\Lambda$ such that

\[ \lambda(\theta) = (a\theta)\mathbf{1} - \gamma \tag{8} \]

for some real $a > 0$ and some $\gamma$ in $R^q$ with coordinates $\gamma_j$, $1 \le j \le q$, where $\mathbf{1}$ is the vector in
$R^q$ with all coordinates equal to 1. The common item
discrimination is $a$, and the item difficulty of item $j$ is $\beta_j = \gamma_j/a$. No restriction is made on the
ability distribution. Thus $p$ is in the set $S_1$ of $p$ that satisfy (6) and (7) for $F$ in $\mathcal{F}$ and for $\lambda$ in $\Lambda_1$.
Provided that $q > 2$, the model is locally restricted. Otherwise, the model is locally unrestricted.

A somewhat less general version of the Rasch model is the normal Rasch model, which requires
that $F$ is the cumulative normal distribution function $\Phi$ (Bock & Lieberman, 1970; Bock & Aitkin,
1981). In this model, $p$ is in the set $S_{1n}$ of $p$ that satisfy (6) and (7) for $F = \Phi$ and $\lambda$ in $\Lambda_1$. As
in the case of the general Rasch model, the normal Rasch model is locally restricted for $q > 2$ and
locally unrestricted for $q = 2$.

In a latent-class Rasch model for a given vector $\tau$ of distinct ability levels $\tau_k$, $1 \le k \le K$,
$K \ge 2$ (Heinen, 1996), it is assumed that the Rasch model holds for $F$ in the set $\mathcal{F}_\tau$ of distribution
functions of random variables that only have values $\tau_k$, $1 \le k \le K$. The corresponding set $S_{1\tau}$
consists of $p$ in $S_m$ that satisfy (6) and (7) for some $F$ in $\mathcal{F}_\tau$ and some $\lambda$ in $\Lambda_1$. For $q > 2$, the
model is locally restricted. For $q = 2$, the model is locally unrestricted.

2PL Models 

In a general 2PL model, $\lambda$ is in the set $\Lambda_2$ of $\lambda$ in $\Lambda$ such that

\[ \lambda(\theta) = \theta a - \gamma \tag{9} \]

for some $a$ in $R^q$ with positive coordinates $a_j$ and some $\gamma$ in $R^q$. Thus $a_j$ is the item discrimination
and $\beta_j = \gamma_j/a_j$ is the item difficulty for item $j$. In the case of no restriction on the ability
distribution, $p$ is in the set $S_2$ of $p$ that satisfy (6) and (7) for $F$ in $\mathcal{F}$ and for some $\lambda$ in
$\Lambda_2$. Rather remarkably, the simple change from the constant item discrimination in the Rasch
model to variable item discrimination in the 2PL model results in a model that is always locally
unrestricted.

The normal 2PL model requires that $F$ is the cumulative normal distribution function $\Phi$. In
this model, $p$ is in the set $S_{2n}$ of $p$ that satisfy (6) and (7) for $F = \Phi$ and for $\lambda$ in $\Lambda_2$. The normal
2PL model is locally restricted for $q > 2$ and locally unrestricted for $q = 2$.

In a latent-class 2PL model for a given vector $\tau$ of distinct ability levels $\tau_k$, $1 \le k \le K$,
$K \ge 2$, it is assumed that the 2PL model holds for $F$ in the set $\mathcal{F}_\tau$. The corresponding set $S_{2\tau}$
consists of $p$ in $S_m$ that satisfy (6) and (7) for some $F$ in $\mathcal{F}_\tau$ and for some $\lambda$ in $\Lambda_2$. In this
case, local restriction occurs if $2q + K < 2^q$, and the model is locally unrestricted if $q \le 3$. As
discussed in Section 4, the theoretical existence of local restriction does not necessarily ensure
that parameter estimation is really practical.


3PL Models 

In a general 3PL model, it is assumed that $p$ is in the set $S_3$ of $p$ in $S_m$ such that (1) and (5)
hold for some $F$ in $\mathcal{F}$ and some $P$ in the set $\mathcal{P}_3$ of $P$ in $\mathcal{P}$ such that, for real $\theta$,

\[ P_j(\theta) = c_j + (1 - c_j)[1 + \exp(-a_j\theta + \gamma_j)]^{-1}, \quad 1 \le j \le q, \tag{10} \]

for some real $a_j > 0$, $c_j$ in $[0, 1)$, and $\gamma_j$, $1 \le j \le q$. Here $c_j$ is the item guessing parameter, $a_j$ is the
item discrimination, and $\beta_j = \gamma_j/a_j$ is the item difficulty for item $j$. Clearly $S_1 \subset S_2 \subset S_3 \subset S_m$,
so that the general 3PL model is locally unrestricted.

The normal 3PL model requires that $F$ is the cumulative normal distribution function $\Phi$. In
this model, $p$ is in the set $S_{3n}$ of $p$ that satisfy (1) and (5) for $F = \Phi$ and for $P$ in $\mathcal{P}_3$. The normal
3PL model is locally restricted for $q > 3$ and locally unrestricted for $q \le 3$.

In a latent-class 3PL model for a given vector $\tau$ of distinct ability levels $\tau_k$, $1 \le k \le K$,
$K \ge 2$, it is assumed that the 3PL model holds for $F$ in the set $\mathcal{F}_\tau$. The corresponding set $S_{3\tau}$
consists of $p$ in $S_m$ that satisfy (1) and (5) for some $F$ in $\mathcal{F}_\tau$ and for some $P$ in $\mathcal{P}_3$. In this case,
local restriction occurs if $3q + K < 2^q$, and the model is locally unrestricted if $q \le 3$. Once again,
the theoretical existence of local restriction does not necessarily ensure that parameter estimation
is really practical.
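
The counting conditions $2q + K < 2^q$ and $3q + K < 2^q$ stated for the latent-class 2PL and 3PL models are easy to tabulate. The following Python sketch finds the smallest number of items $q$ for which each condition holds; the function names and the values of $K$ are hypothetical and chosen only for illustration. The condition concerns local restriction only and says nothing about whether estimation is practical.

```python
def counting_condition(q, K, params_per_item):
    """True if params_per_item * q + K < 2**q (2 for 2PL, 3 for 3PL)."""
    return params_per_item * q + K < 2 ** q


def smallest_q(K, params_per_item, q_max=64):
    """Smallest q >= 2 satisfying the counting condition, or None."""
    for q in range(2, q_max + 1):
        if counting_condition(q, K, params_per_item):
            return q
    return None


# Hypothetical numbers of latent classes.
two_pl = {K: smallest_q(K, 2) for K in (2, 5, 10)}
three_pl = {K: smallest_q(K, 3) for K in (2, 5, 10)}
```

For example, with $K = 2$ the condition first holds at $q = 4$ for both models, since $2 \cdot 4 + 2 = 10 < 16$ and $3 \cdot 4 + 2 = 14 < 16$, whereas it fails at $q = 3$.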


2. The General 2PL and 3PL Cases 

As already noted, the general 2PL and 3PL models are locally unrestricted. The proof relies on
the following theorem.

Theorem 1. Let $\lambda$ be in $\Lambda$, and let (1) hold. Let $T$ be a subset of $S_m$ such that $p$ is in $T$ if (7)
holds for some $F$ in $\mathcal{F}$. Let $c$ be the function from the real line $R$ to $R^J$ with coordinates

\[ c_y = \exp(y'\lambda), \quad y \in J. \]

Let the $c(\theta)$, $\theta$ real, span $R^J$. Then model $M(T)$ is locally unrestricted.

Proof. Consider distinct $\tau_k$, $1 \le k \le m$, such that $c(\tau_k)$, $1 \le k \le m$, spans $R^J$. Let $A_m$
be the unit simplex in $R^m$, so that an $m$-dimensional vector $\alpha$ is in $A_m$ if the coordinates $\alpha_k$ are
nonnegative for $1 \le k \le m$ and if $\sum_{k=1}^{m} \alpha_k = 1$. Let $\alpha$ be in $A_m$, and let $F$ be the distribution
function of a random variable that assigns probability $\alpha_k$ to $\tau_k$ for $1 \le k \le m$. If

\[ p = \sum_{k=1}^{m} \alpha_k V(\tau_k) c(\tau_k), \]

then (7) holds. Thus $T$ includes $p$. Because $\alpha$ is arbitrary, $T$ includes the simplex $D$ with vertices
$V(\tau_k)c(\tau_k)$, $1 \le k \le m$. To demonstrate that $\mathrm{ir}(T)$ is nonempty, it suffices to show that $\mathrm{ir}(D)$ is
nonempty. To do so, it suffices in turn to demonstrate that

\[ \sum_{k=1}^{m} a_k V(\tau_k) c(\tau_k) = 0 \tag{11} \]

and

\[ \sum_{k=1}^{m} a_k = 0 \tag{12} \]

only if $a_k = 0$ for $1 \le k \le m$ (Rockafellar, 1970, pp. 6, 13). By assumption, (11) implies that
$a_k V(\tau_k)$ is 0 for each $k$, so that $a_k = 0$ for $1 \le k \le m$.

The simplest application of the theorem is to the 2PL model. Consider the following result.

Theorem 2. The general 2PL model is locally unrestricted.

Proof. Consider $a_j$, $1 \le j \le q$, such that

\[ s_y = \sum_{j=1}^{q} a_j y_j \]

has $m$ distinct values for $y$ in $J$. Let $\tau_k$, $1 \le k \le m$, be distinct real numbers. Consider arbitrary
real $\gamma_j$, $1 \le j \le q$, and let (9) and (3) hold. Let

\[ t_y = \sum_{j=1}^{q} \gamma_j y_j. \]

In Theorem 1,

\[ c_y(\tau_k) = \exp(\tau_k s_y) \exp(-t_y). \]

The determinant of an $m$ by $m$ matrix with elements $\exp(b_i d_k)$, $1 \le i \le m$, $1 \le k \le m$, is
positive if $b_i$ is strictly increasing in $i$ and $d_k$ is strictly increasing in $k$ (Karlin & Studden,
1966, pp. 9-10). By elementary linear algebra, it follows that the $c(\tau_k)$, $1 \le k \le m$, must be
linearly independent. Because $R^J$ has dimension $m = 2^q$, it follows that the $c(\tau_k)$, $1 \le k \le m$,
span $R^J$, so that the conditions of Theorem 1 are satisfied.
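
The linear independence used in this proof can be illustrated numerically for $q = 2$. In the sketch below, the discriminations, the values $\gamma_j$, and the ability levels $\tau_k$ are hypothetical, and the determinant routine is a plain Gaussian elimination written for this example rather than a library call.

```python
import math
from itertools import product


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


q = 2
a_disc = [1.0, 2.0]          # s_y takes the 2**q distinct values 0, 1, 2, 3
gamma = [0.5, -0.3]          # arbitrary real gamma_j
tau = [-1.0, 0.0, 1.0, 2.0]  # m = 2**q distinct ability levels

# Order the patterns so that s_y is strictly increasing.
patterns = sorted(product((0, 1), repeat=q),
                  key=lambda y: sum(aj * yj for aj, yj in zip(a_disc, y)))


def c(theta):
    """Vector c(theta) with coordinates exp(theta * s_y) * exp(-t_y)."""
    out = []
    for y in patterns:
        s_y = sum(aj * yj for aj, yj in zip(a_disc, y))
        t_y = sum(gj * yj for gj, yj in zip(gamma, y))
        out.append(math.exp(theta * s_y - t_y))
    return out


C = [c(theta) for theta in tau]
det_C = determinant(C)
```

With rows ordered by increasing $\tau_k$ and columns by increasing $s_y$, the determinant is positive, so the four vectors $c(\tau_k)$ are linearly independent and span $R^J$ in this example.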

Because $S_2 \subset S_3 \subset S_m$, it follows that $\mathrm{ir}(S_2) \subset \mathrm{ir}(S_3) \subset \mathrm{ir}(S_m)$. Thus Theorem 2 implies
that the general 3PL model $M(S_3)$ is locally unrestricted, as is the general one-dimensional model
$M(S_m)$.

Examination of the proof of Theorem 1 reveals a further problem with the 2PL and 3PL
models. In the 2PL case, if $a$ is chosen as in the proof of Theorem 2, $\gamma$ is in $R^q$, and $p$ satisfies
(6), (7), and (9), then open neighborhoods $O$ of $a$ and $O'$ of $\gamma$ exist such that, if $b$ is in $O$ and $\delta$
is in $O'$, then a distribution function $G$ exists such that, with

\[ \mu(\theta) = \theta b - \delta \]

and with $W$ defined as in (6) with $\mu$ in place of $\lambda$,

\[ p_y = \int W \exp(y'\mu) \, dG. \]

Thus the model parameters $a_j$ and $\gamma_j$ are not estimable in the general 2PL model. Similar
comments apply to estimation of $a_j$, $c_j$, and $\gamma_j$ in the general 3PL model.


3. Dimension Theory and Local Restriction 

In many cases, for a nonempty subset $T$ of the simplex $S$, a determination that $M(T)$ is
locally restricted is based on a determination of the topological dimension of $T$. The topological
dimension $\dim(T)$ of $T$ may be defined in terms of finite open coverings. Here a finite class $\mathcal{C}$ of
nonempty open subsets of $R^J$ is a finite open covering of $T$ if each $x$ in $T$ is included in some $C$ in
$\mathcal{C}$. A finite open cover $\mathcal{D}$ of $T$ is a refinement of $\mathcal{C}$ if to each $D$ in $\mathcal{D}$ corresponds a $C$ in $\mathcal{C}$ such
that $D \subset C$. The finite open cover $\mathcal{D}$ has integer order $k \ge 0$ if distinct sets $D_i$, $1 \le i \le k + 1$, in
$\mathcal{D}$ exist such that $\bigcap_{i=1}^{k+1} D_i$ is nonempty and if no distinct sets $D_i$, $1 \le i \le k + 2$, in $\mathcal{D}$ exist such
that $\bigcap_{i=1}^{k+2} D_i$ is nonempty. The topological dimension $\dim(T)$ of $T$ is the smallest integer $k \ge 0$
such that every finite open cover $\mathcal{C}$ of $T$ has a refinement $\mathcal{D}$ of order $k$ (Hurewicz & Wallman,
1941, pp. 5, 54, 56).

As evident from the following known results, the topological dimension as defined here does 
have properties intuitively expected of a definition of dimension. 



Theorem 3. If $U$ is a nonempty subset of $T$ and $T$ is a subset of $S$, then $\dim(U) \le \dim(T)$
(Hurewicz & Wallman, 1941, p. 26).

Theorem 4. The simplex $S$ has topological dimension $2^q - 1$ (Hurewicz & Wallman, 1941, p. 46).

Theorem 5. If $T$ is a nonempty subset of $S$, then $\dim(T) = 2^q - 1$ if, and only if, $\mathrm{ir}(T)$ is nonempty
(Hurewicz & Wallman, 1941, p. 46).

An alternative version of Theorem 5 is that $M(T)$ is locally unrestricted if, and only if,
$\dim(T) = 2^q - 1$, and $M(T)$ is locally restricted if, and only if, $\dim(T) < 2^q - 1$.

Theorem 6. If $T$ is a nonempty subset of $S$, $r \ge 1$ is an integer, $O$ is a subset of $R^r$ with nonempty
interior, $g$ is a continuous one-to-one function from $O$ onto $T$, and $g^{-1}$ is also continuous, then $T$
has dimension $r$ (Hurewicz & Wallman, 1941, p. 46).

Under the conditions of Theorem 6, an $r$-dimensional parameter $\eta$ in $O$ is uniquely defined
by the equation $p = g(\eta)$.

Theorem 7. If $T$ is a nonempty subset of $S$, if $r \ge 1$ is an integer, if $O$ is the union of a countable
number of closed and bounded nonempty convex subsets of $R^r$, and $g$ is a continuously differentiable
function from $O$ onto $T$, then $\dim(T) \le r$.

Proof. The conclusion follows from standard results concerning Hausdorff dimension
(Falconer, 1990, pp. 29-30) given the relationship of Hausdorff dimension to topological dimension
(Hurewicz & Wallman, 1941, p. 104).

In Theorem 7, the condition on $O$ holds if $O = A \cap B$ for nonempty convex subsets $A$ and $B$
of $R^r$ such that $A$ is closed and $B$ is open (Haberman, 1996, p. 180). In all examples considered
in this report, the conditions on $O$ are satisfied.

Theorem 8. Under the conditions of Theorem 7, let $g$ have coordinates $g_y$ for $y$ in $J$, and let the
gradient of $g_y$ be $\nabla g_y$. Let $U$ be the subset of $\eta$ in the interior of $O$ such that the vectors $\nabla g_y(\eta)$,
$y$ in $J$, span $R^r$. If $U$ is nonempty, then $\dim(T) = r$.

Proof. For $\eta$ in $U$, there exists a subset $K$ of $J$ with $r$ elements such that the $\nabla g_y(\eta)$ are
linearly independent for $y$ in $K$. The inverse-mapping theorem then implies that a nonempty
open subset $N$ of $O$ and a nonempty subset $Z$ of $T$ exist such that the restriction of $g$ to $N$ is a
one-to-one function onto $Z$ with a continuous inverse (Loomis & Sternberg, 1968, p. 167). Given
Theorems 3, 6, and 7, both $Z$ and $T$ have dimension $r$.

Theorem 8 has a significant impact on the use of maximum likelihood. Let the conditions of
Theorem 8 hold, and let $T \subset S_0$. Define $\eta$, $N$, and $Z$ as in the proof of Theorem 8, and let
$p = g(\eta)$. Consider the nonsingular information matrix

\[ I = \sum_{y \in J} [g_y(\eta)]^{-1} \nabla g_y(\eta) [\nabla g_y(\eta)]'. \]

Let $\hat{\eta}$ in $N$ be a function of $f$ such that $g(\hat{\eta}) = \hat{p}$ whenever $\hat{p}$ is a maximum-likelihood estimate of
$p$ under the model $M(Z)$, so that $\hat{\eta}$ is the maximum-likelihood estimate of $\eta$. Then $n^{1/2}(\hat{\eta} - \eta)$
converges in distribution to a multivariate normal random vector with mean 0 and covariance
matrix $I^{-1}$ (Birch, 1964). If $g$ is a one-to-one function with a continuous inverse, then $N$ can
be defined to be the interior of $O$ and $\hat{\eta}$ can be defined so that $g(\hat{\eta}) = \hat{p}$ whenever $\hat{p}$ is a
maximum-likelihood estimate of $p$ under model $M(T)$.
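
The information matrix above can be approximated numerically for any smooth parametrization $g$. The sketch below uses central differences for the gradients and applies the formula to a toy model of independent items, which is an assumption made purely for illustration and is not one of the models studied in this report; the function names are likewise hypothetical.

```python
import math
from itertools import product


def information_matrix(g, eta, eps=1e-6):
    """Approximate I = sum_y g_y(eta)^{-1} grad g_y(eta) grad g_y(eta)'
    with gradients computed by central differences."""
    r = len(eta)
    base = g(eta)
    grads = {y: [0.0] * r for y in base}
    for j in range(r):
        up = list(eta)
        up[j] += eps
        down = list(eta)
        down[j] -= eps
        g_up, g_down = g(up), g(down)
        for y in base:
            grads[y][j] = (g_up[y] - g_down[y]) / (2.0 * eps)
    info = [[0.0] * r for _ in range(r)]
    for y, gy in base.items():
        for i in range(r):
            for j in range(r):
                info[i][j] += grads[y][i] * grads[y][j] / gy
    return info


def independent_items(eta):
    """Toy model: q independent items with P(Y_j = 1) = 1/(1 + exp(-eta_j))."""
    probs = [1.0 / (1.0 + math.exp(-e)) for e in eta]
    out = {}
    for y in product((0, 1), repeat=len(eta)):
        value = 1.0
        for pj, yj in zip(probs, y):
            value *= pj if yj == 1 else 1.0 - pj
        out[y] = value
    return out


info = information_matrix(independent_items, [0.0, 0.0])
```

For this toy model the exact information matrix is diagonal with entries $P_j(1 - P_j)$, so at $\eta = (0, 0)$ the numerical result should be close to a diagonal matrix with entries 0.25. A nearly singular matrix in the same computation would signal the estimation difficulties discussed in Section 4.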

In Theorem 8, the condition that $\nabla g_y(\eta)$, $y$ in $J$, spans $R^r$ is equivalent to the condition that
no $b$ in $R^r$ exists such that $b \ne 0$ and $b'\nabla g_y(\eta) = 0$ for all $y$ in $J$.

The Rasch Model 

Given these preliminaries, it is readily shown that the general Rasch model is locally restricted
if, and only if, $q > 2$. Consider the following theorem.

Theorem 9. The topological dimension $\dim(S_1) = 2q - 1$, so that the general Rasch model is locally
restricted if, and only if, $q > 2$.

Proof. Let $u_y = \sum_{j=1}^{q} y_j$ for $y$ in $J$, so that $u_y$ is a nonnegative integer not greater than $q$.
Let $g$ be the function on $R^{2q-1}$ such that

\[ g_y(x) = c(x) \exp\Bigl( \sum_{j=1}^{2q-1} x_j t_{jy} \Bigr), \tag{13} \]

\[ c(x) = \Bigl[ \sum_{y \in J} \exp\Bigl( \sum_{j=1}^{2q-1} x_j t_{jy} \Bigr) \Bigr]^{-1}, \tag{14} \]

and

\[ t_{jy} = \begin{cases} 1, & 1 \le u_y = j \le q, \\ 0, & u_y \ne j, \ 1 \le j \le q, \\ 1, & y_{j+1-q} = 1, \ q + 1 \le j \le 2q - 1, \\ 0, & y_{j+1-q} \ne 1, \ q + 1 \le j \le 2q - 1. \end{cases} \tag{15} \]

Let $S_{1e}$ be the image of $g$. Then $S_1 \subset S_{1e}$ (Tjur, 1982; Cressie & Holland, 1983; Haberman, 2004).
The function $g$ from $R^{2q-1}$ onto $S_{1e}$ is continuously differentiable and has a continuous inverse
$g^{-1}$ (Haberman, 1973), so that Theorem 6 implies that $\dim(S_{1e}) = 2q - 1$. Because $S_1 \subset S_{1e}$,
Theorem 3 implies that $\dim(S_1) \le 2q - 1$.

To discuss the relationship of $S_1$ and $S_{1e}$ requires consideration of two determinants. Let
$A_k(x)$ be defined for $x$ in $R^{2q-1}$ and for $0 \le k \le q$ in the following fashion. If $k$ is even, then let
$d = k/2$ and let $A_k(x)$ be the $d + 1$ by $d + 1$ matrix with element in row $i$ and column $j$ equal to $\exp(x_{i+j-2})$
for integers $i$ and $j$ from 1 to $d + 1$. If $k$ is odd, then let $d = (k + 1)/2$, and let $A_k(x)$ be the $d$ by
$d$ matrix with element in row $i$ and column $j$ equal to $\exp(x_{i+j-1})$ for integers $i$ and $j$ from 1 to $d$. Let $N$
be the set of $x$ in $R^{2q-1}$ such that the determinants of $A_q(x)$ and $A_{q-1}(x)$ are positive. The set
$N$ is nonempty (Karlin & Studden, 1966, pp. 38, 171). Because a determinant of a matrix is a
continuous function of the elements of the matrix, $N$ is a nonempty open subset of $R^{2q-1}$. If $x$
is in $N$, then $g(x)$ is in $S_1$ (Cressie & Holland, 1983; Lindsay, Clogg, & Grego, 1991; Haberman,
2004). By Theorems 3 and 6, $\dim(S_1) = 2q - 1$. Because $2q - 1 < 2^q - 1$ if, and only if, $q > 2$,
it follows from Theorem 5 that the general Rasch model is locally restricted if, and only if,
$q > 2$.
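
The parametrization in (13) to (15) can be made concrete for $q = 3$, where $2q - 1 = 5$ parameters describe an array with $2^q - 1 = 7$ free coordinates. The Python sketch below builds the indicators $t_{jy}$ and evaluates $g(x)$; the vector `x` is a hypothetical choice and the function names are illustrative only.

```python
import math
from itertools import product


def t(j, y):
    """Indicator t_{jy} of (15); j runs from 1 to 2q - 1."""
    q = len(y)
    u_y = sum(y)
    if j <= q:
        return 1 if u_y == j else 0
    # For j > q, the 1-based index j + 1 - q is the 0-based index j - q.
    return 1 if y[j - q] == 1 else 0


def g(x):
    """Array g(x) of (13), normalized by c(x) of (14)."""
    q = (len(x) + 1) // 2
    weights = {}
    for y in product((0, 1), repeat=q):
        weights[y] = math.exp(sum(x[j - 1] * t(j, y) for j in range(1, 2 * q)))
    c = 1.0 / sum(weights.values())
    return {y: c * w for y, w in weights.items()}


x = [0.2, -0.4, 0.1, 0.7, -0.3]  # hypothetical point in R^(2q-1) with q = 3
p = g(x)
ratio = p[(1, 0, 0)] / p[(0, 1, 0)]
```

Patterns with the same total score differ only through the coordinates $x_j$ with $j > q$. In this example the ratio of the probabilities of the patterns $(1, 0, 0)$ and $(0, 1, 0)$ equals $\exp(-x_4)$.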


In Theorem 9, local restriction does not eliminate all problems of parameter estimation.
One has $g(\eta) = p$ for a unique $\eta$ in $R^{2q-1}$ if $p$ is in $S_1$, and the inverse of $g$ is continuous;
however, the parameter $\eta$ is not a simple quantity. Given (6), (7), and (8), $\eta_j = \gamma_{j+1-q} - \gamma_1$ for
$q + 1 \le j \le 2q - 1$, so that contrasts between the $\gamma_j$ are readily estimated, but

\[ \exp(\eta_j) = \frac{\int V(\theta) \exp(aj\theta) \, dF(\theta)}{\int V(\theta) \, dF(\theta)} \]

for $1 \le j \le q$, so that the item discrimination $a$ and ability distribution $F$ are not determined by
the general Rasch model (Lindsay et al., 1991).

Given Theorem 9, it follows that the normal Rasch model and the latent-class Rasch model
are both locally restricted if $q > 2$. A bit more can be said in terms of parameter estimation given
dimensional analysis. Consider the following results.

Theorem 10. In the normal Rasch model, $\dim(S_{1n}) = q + 1$, and $M(S_{1n})$ is locally unrestricted
only if $q = 2$.


Proof. Let $O$ be the subset of $R^{q+1}$ of vectors $x$ with $x_{q+1} > 0$. For $x$ in $O$, let

\[ h(x, \theta) = \prod_{j=1}^{q} [1 + \exp(x_{q+1}\theta - x_j)]^{-1}, \]

and let

\[ v_y(x) = \sum_{j=1}^{q} y_j x_j. \]

Define $u_y$ as in the proof of Theorem 9. Let $g$ be the function from $O$ onto $S_{1n}$ with coordinates
$g_y$, $y$ in $J$, such that

\[ g_y(x) = \exp[-v_y(x)] \int h(x, \theta) \exp(x_{q+1} u_y \theta) \phi(\theta) \, d\theta, \]

where $\phi$ is the density of $\Phi$. Under the normal Rasch model, (6), (7), (8), and $F = \Phi$ imply that $p = g(\eta)$ if $\eta_j = \gamma_j$ for
$1 \le j \le q$ and $\eta_{q+1} = a$.

Given general results for exponential families with incomplete data, it is readily verified that
$g$ is continuously differentiable (Sundberg, 1974). Let

\[ d_j(x, \theta) = [1 + \exp(-x_{q+1}\theta + x_j)]^{-1} \]

for $1 \le j \le q$, and let

\[ d_+(x, \theta) = \sum_{j=1}^{q} d_j(x, \theta). \]

The gradient $\nabla g_y(x)$ of $g_y$ at $x$ has elements $g_{jy}(x)$, $1 \le j \le q + 1$, such that $g_{jy}(x)$ is

\[ -\exp[-v_y(x)] \int [y_j - d_j(x, \theta)] h(x, \theta) \exp(x_{q+1} u_y \theta) \phi(\theta) \, d\theta \]

for $1 \le j \le q$ and

\[ \exp[-v_y(x)] \int \theta [u_y - d_+(x, \theta)] h(x, \theta) \exp(x_{q+1} u_y \theta) \phi(\theta) \, d\theta \]

for $j = q + 1$. By Theorem 7, $\dim(S_{1n}) \le q + 1$.

Let $b$ in $R^{q+1}$ satisfy

\[ b'\nabla g_y(x) = 0 \tag{16} \]

for all $y$ in $J$. Let



and 

\[ \int d_+(r, \theta) h(r, \theta) \phi(\theta) \, d\theta > 0. \]

It follows that $b = 0$ if

\[ \int \theta \, d_+(x, \theta) h(x, \theta) \phi(\theta) \, d\theta < 0 \tag{18} \]

and

\[ \int \theta \, d_+(r, \theta) h(r, \theta) \phi(\theta) \, d\theta < 0. \tag{19} \]

The inequalities (18) and (19) hold if $x$ has coordinates $x_j = 0$ for $1 \le j \le q$, for $x$ and $r$ are the
same, and

\[ h(x, \theta) d_+(x, \theta) - h(x, -\theta) d_+(x, -\theta) \]

is equal to

\[ h(x, \theta) d_+(x, \theta) [1 - \exp((q - 1) x_{q+1} \theta)] < 0 \]

for $\theta > 0$. Thus

\[ \int \theta \, d_+(x, \theta) h(x, \theta) \phi(\theta) \, d\theta = \int_0^\infty \theta [1 - \exp((q - 1) x_{q+1} \theta)] d_+(x, \theta) h(x, \theta) \phi(\theta) \, d\theta < 0. \]

It follows that $x$ in $O$ can be selected so that $b$ must be 0 if (16) holds for all $y$ in $J$. By
Theorem 8, $\dim(S_{1n}) = q + 1$, so that Theorem 5 implies that $M(S_{1n})$ is then locally unrestricted
if, and only if, $q + 1 = 2^q - 1$. Because $q + 1 = 2^q - 1$ only holds if $q = 2$, $M(S_{1n})$ is only locally
unrestricted if $q = 2$.
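
The symmetry argument used for (18) and (19) can be checked numerically. The sketch below assumes $x_j = 0$ for $1 \le j \le q$ and writes $a$ for $x_{q+1}$, so that $h(x, \theta) = [1 + \exp(a\theta)]^{-q}$ and, taking $d_j(x, \theta) = [1 + \exp(-a\theta)]^{-1}$, $d_+(x, \theta) = q[1 + \exp(-a\theta)]^{-1}$. The grid of values and the function names are hypothetical.

```python
import math


def h(a, q, theta):
    """h(x, theta) when x_j = 0 for j <= q and x_{q+1} = a."""
    return (1.0 + math.exp(a * theta)) ** (-q)


def d_plus(a, q, theta):
    """d_+(x, theta) under the same choice of x."""
    return q / (1.0 + math.exp(-a * theta))


def difference(a, q, theta):
    """h(x, theta) d_+(x, theta) - h(x, -theta) d_+(x, -theta)."""
    return (h(a, q, theta) * d_plus(a, q, theta)
            - h(a, q, -theta) * d_plus(a, q, -theta))


def closed_form(a, q, theta):
    """h(x, theta) d_+(x, theta) [1 - exp((q - 1) a theta)]."""
    return (h(a, q, theta) * d_plus(a, q, theta)
            * (1.0 - math.exp((q - 1) * a * theta)))


checks = [(difference(a, q, theta), closed_form(a, q, theta))
          for a in (0.5, 1.0, 2.0)
          for q in (2, 3, 5)
          for theta in (0.1, 1.0, 3.0)]
```

On this grid the two expressions agree up to rounding error and are negative for every $\theta > 0$, which is the sign needed for the integrals in (18) and (19).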

Given Theorem 9, the latent-class Rasch model is obviously locally restricted if $q > 2$, for
$S_{1\tau} \subset S_1$. The following result shows that the latent-class Rasch model is locally unrestricted if
$q < 3$.

Theorem 11. If $q \le 3$, then $S_{1\tau} = S_1$, so that the latent-class Rasch model $M(S_{1\tau})$ is locally
unrestricted for $q = 2$.



Proof. If $p$ is in $S_1$, then there exist $a > 0$, $\gamma$ in $R^q$, $\rho_1 < \rho_2$, and nonnegative $\pi_1$ and $\pi_2$ such
that $\pi_1 + \pi_2 = 1$ and (6), (7), and (8) hold for $F$ the distribution function of a random variable that
assigns probability $\pi_k$ to $\rho_k$ for $k$ equal to 1 or 2 (Karlin & Studden, 1966; Lindsay et al., 1991). Let
$G$ be the distribution function of a random variable that assigns probability $\pi_k$ to $\tau_k$ for $k$ equal to 1
or 2. Let

\[ b = a(\rho_2 - \rho_1)/(\tau_2 - \tau_1), \]

and let $\delta$ in $R^q$ satisfy

\[ \delta = \gamma + a \, \frac{\tau_2 \rho_1 - \tau_1 \rho_2}{\tau_1 - \tau_2} \, \mathbf{1}. \]

Let

\[ \mu(\theta) = (b\theta)\mathbf{1} - \delta \]

for $\theta$ real, and let

\[ W = \prod_{j=1}^{q} [1 + \exp(\mu_j)]^{-1}. \]

Then

\[ p_y = \int W \exp(y'\mu) \, dG, \]

so that $p$ is in $S_{1\tau}$. It follows that $S_{1\tau}$ and $S_1$ are the same, so that the conclusion of the theorem
follows from Theorem 9.
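
The change of parameters in this proof can be verified numerically. In the following sketch, a Rasch model with a two-point ability distribution on $\rho_1 < \rho_2$ is re-expressed as a Rasch model on prescribed levels $\tau_1 < \tau_2$ by use of $b$ and $\delta$ as defined above. All numerical values and the function name `rasch_two_point` are hypothetical.

```python
import math
from itertools import product


def rasch_two_point(a, gamma, points, weights):
    """p_y from (6)-(8) when F puts mass weights[k] on points[k]."""
    q = len(gamma)
    p = {}
    for y in product((0, 1), repeat=q):
        total = 0.0
        for theta, w in zip(points, weights):
            lam = [a * theta - g for g in gamma]  # logits from (8)
            V = 1.0
            for l in lam:
                V /= 1.0 + math.exp(l)
            total += w * V * math.exp(sum(yj * l for yj, l in zip(y, lam)))
        p[y] = total
    return p


# Hypothetical original parametrization with support rho_1 < rho_2.
a, gamma = 1.3, [-0.2, 0.4, 1.0]
rho, pi = (-0.7, 1.1), (0.35, 0.65)
# Prescribed latent-class ability levels.
tau = (0.0, 1.0)

b = a * (rho[1] - rho[0]) / (tau[1] - tau[0])
shift = a * (tau[1] * rho[0] - tau[0] * rho[1]) / (tau[0] - tau[1])
delta = [g + shift for g in gamma]

p_original = rasch_two_point(a, gamma, rho, pi)
p_latent = rasch_two_point(b, delta, tau, pi)
```

The two arrays coincide, because $b\tau_k - \delta_j = a\rho_k - \gamma_j$ for $k$ equal to 1 or 2 and each item $j$.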


In general, an argument similar to that in Theorem 10 may be used to demonstrate that
$\dim(S_{1\tau}) \le q + K$. Consider the following theorem.

Theorem 12. In the latent-class Rasch model, $\dim(S_{1\tau}) \le q + K$.


Proof. Let $O$ be the set of $x$ in $R^{q+K}$ such that $x_{q+1} > 0$, $x_j \ge 0$ for $j > q + 1$, and
$\sum_{j=q+2}^{q+K} x_j \le 1$. Let $h$ be the real function such that

\[ h(x, \theta) = \prod_{j=1}^{q} [1 + \exp(x_{q+1}\theta - x_j)]^{-1} \]

for $x$ in $O$ and real $\theta$. Define $u_y$ as in the proof of Theorem 9. Let $g$ be the function from $O$ onto
$S_{1\tau}$ such that the coordinate $g_y$, $y$ in $J$, satisfies

\[ g_y(x) = \sum_{k=1}^{K-1} x_{q+1+k} h(x, \tau_k) \exp(x_{q+1} u_y \tau_k) + \Bigl[ 1 - \sum_{k=1}^{K-1} x_{q+1+k} \Bigr] h(x, \tau_K) \exp(x_{q+1} u_y \tau_K). \]

Note that $p = g(\eta)$ if (6), (7), and (8) hold, $\sum_{k=1}^{K} \pi_k = 1$, $F$ is the distribution function for a
random variable that equals $\tau_k$ with probability $\pi_k$ for $1 \le k \le K$, $\eta_j = \gamma_j$ for $1 \le j \le q$, $\eta_{q+1} = a$,
and $\eta_{q+k+1} = \pi_k$ for $1 \le k \le K - 1$. It is easily verified that $g$ is continuously differentiable. By
Theorem 7, $\dim(S_{1\tau})$ does not exceed $q + K$.


The upper bound in Theorem 12 is not necessarily achieved. For example, Theorem 11
implies that $\dim(S_{1\tau}) = 3 < q + K$ if $q = 2$. On the other hand, Theorem 11 also implies that the
upper bound of $q + K = 5$ is achieved if $q = 3$ and $K = 2$. In general, because $\dim(S_1) = 2q - 1$,
$\dim(S_{1\tau})$ cannot exceed $2q - 1$. Thus standard conditions for estimation of the parameters $\gamma_j$,
$1 \le j \le q$, $a$, and $\pi_k$, $1 \le k \le K - 1$, cannot hold if $K > q - 1$.

Restricted 2PL Models

Restricted 2PL models may be locally restricted. The required arguments are quite similar
to those used for restricted Rasch models. In the case of the normal 2PL model, the relationship
$S_{1n} \subset S_{2n}$ and Theorem 10 clearly imply that $\dim(S_{2n}) = 3$ and $M(S_{2n})$ is locally unrestricted if
$q = 2$. For $q > 2$, $\dim(S_{2n}) = 2q$ and $M(S_{2n})$ is locally restricted.

Given Theorem 11 and the relationship $S_{1\tau} \subset S_{2\tau}$, it is clearly true that the latent-class
2PL model is locally unrestricted for $q = 2$. For $q \ge 3$, $2q + 1 \le \dim(S_{2\tau}) \le 2q + K - 1$, so that
Theorem 5 implies that the model is locally restricted if $2q + K < 2^q$ and locally unrestricted if
$q = 3$. The case of $q = 3$ can also be considered by use of equations used in latent-class analysis
that involve determinants and eigenvalues (Madansky, 1960). This argument relies on continuity
properties of eigenvalues and eigenvectors (Wilkinson, 1965, chaps. 1-2).

In principle, the latent-class 2PL model is locally restricted as long as $K < 2^q - 2q$; however,
as evident in Section 4, it should not be concluded that use of a large value of $K$ is wise.

Restricted 3PL Models 

Restricted 3PL models also may be locally restricted. Arguments are again similar to those
used in Theorems 10 and 12. In the normal 3PL model, the relationship $S_{2n} \subset S_{3n} \subset S$ and
Theorem 3 imply that $\dim(S_{3n}) = 2^q - 1$ and $M(S_{3n})$ is locally unrestricted for $q \le 3$. For $q > 3$,
$M(S_{3n})$ is locally restricted and $\dim(S_{3n}) = 3q$.

In the case of latent-class 3PL models, the relationship $S_{2\tau} \subset S_{3\tau}$ implies that $M(S_{3\tau})$ is
locally unrestricted if $q \le 3$. For $q > 3$, $\dim(S_{3\tau}) \le 3q + K - 1$. If $K = 2$, then $S_{3\tau} = S_{2\tau}$, so that
$\dim(S_{3\tau}) = \dim(S_{2\tau}) = 2q + 1$ for $q > 3$, and $M(S_{3\tau})$ is locally restricted for $q > 3$. To verify the
identity of $S_{2\tau}$ and $S_{3\tau}$, consider the following theorem.

Theorem 13. If $K = 2$, then $S_{2\tau} = S_{3\tau}$.

Proof. Because $S_{2\tau} \subset S_{3\tau}$, it suffices to show that $S_{3\tau}$ is in $S_{2\tau}$. If $p$ is in $S_{3\tau}$, then (1), (5),
and (10) hold for some $a_j > 0$, real $\gamma_j$, and $c_j$ in $[0, 1)$ and some distribution function $F$ such that,
for some nonnegative $\pi_1$ and $\pi_2$ with sum $\pi_1 + \pi_2 = 1$, $F$ is the distribution function of a random
variable that assigns probability $\pi_k$ to $\tau_k$ for $1 \le k \le K = 2$. Define $b$ and $\delta$ in $R^q$ by solution of
the simultaneous equations

\[ b_j \tau_k - \delta_j = \log[P_j(\tau_k)/Q_j(\tau_k)], \quad 1 \le k \le 2, \]

for $1 \le j \le q$. Let $\mu(\theta) = \theta b - \delta$. Then

\[ P_j(\tau_k) = \{1 + \exp[-\mu_j(\tau_k)]\}^{-1} \]

and

\[ Q_j(\tau_k) = \{1 + \exp[\mu_j(\tau_k)]\}^{-1} \]

for $1 \le j \le q$, so that $p$ is in $S_{2\tau}$.
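
The construction in this proof is easy to reproduce numerically. The sketch below takes hypothetical 3PL item parameters and two ability levels, solves the two linear equations for $b_j$ and $\delta_j$, and confirms that the resulting 2PL curves match the 3PL curves at both levels. The function names and parameter values are illustrative assumptions.

```python
import math


def icc_3pl(theta, a, gamma, c):
    """3PL item characteristic curve of (10)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * theta + gamma))


def icc_2pl(theta, b, delta):
    """2PL item characteristic curve with logit b * theta - delta."""
    return 1.0 / (1.0 + math.exp(-(b * theta - delta)))


def logit(p):
    return math.log(p / (1.0 - p))


def match_2pl(a, gamma, c, tau1, tau2):
    """Solve b * tau_k - delta = logit(P_j(tau_k)) for k = 1, 2."""
    l1 = logit(icc_3pl(tau1, a, gamma, c))
    l2 = logit(icc_3pl(tau2, a, gamma, c))
    b = (l2 - l1) / (tau2 - tau1)
    delta = b * tau1 - l1
    return b, delta


tau1, tau2 = -1.0, 1.0
# Hypothetical items given as (a_j, gamma_j, c_j).
items = [(1.2, 0.3, 0.2), (0.7, -0.5, 0.25), (2.0, 1.0, 0.1)]
fitted = [match_2pl(a, g, c, tau1, tau2) for a, g, c in items]
```

Because each 3PL curve is strictly increasing, each fitted $b_j$ is positive, so the matched curves are valid 2PL curves. With only two ability levels, the data therefore cannot distinguish the 3PL parameters $(a_j, \gamma_j, c_j)$ from the 2PL parameters $(b_j, \delta_j)$.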

4. Parameter Estimation for Latent-Class Models for Item Responses 

Even though simple conditions are available to ensure that latent-class 2PL and 3PL
models are locally restricted, these conditions do not imply that parameter estimation is readily
accomplished. Some problems reflect fundamental problems of parameter identification that occur
when the upper bound for the topological dimension is not achieved. For example, in a latent-class
3PL model with $K = 2$, it is not possible to identify $a_j$, $c_j$, and $\gamma_j$, as is evident from Theorem 13.

More complex problems involve cases in which the conditions of Theorem 7 are satisfied for
$S_{2\tau}$ or $S_{3\tau}$ for some function $g$ but the information matrix $I$ is nearly singular. This situation has
adverse effects on numerical algorithms for computation of maximum-likelihood estimates and
adverse effects on the accuracy of parameter estimates. Such difficulties have been noted previously
(Heinen, 1996), and they are relevant both with the EM algorithm (Dempster, Laird, & Rubin,
1977) and with the stabilized Newton-Raphson algorithm (Haberman, 1988). In particular, the
problems of parameter estimation for latent-class 2PL and 3PL models have practical consequences
for item calibration in the National Assessment of Educational Progress (NAEP) based on the
Parscale/NAEP program option to employ a latent-class 3PL model with 41 latent classes.

The importance of the problem is examined in this section by use of the latent-class 2PL 
model. It should be emphasized that problems will be substantially more severe in the 3PL case. 
For analysis in this section, the parametrization uses the subset O of $R^r$, r = 2q + K - 1, such
that x is in O if $x_j > 0$ for $q + 1 \le j \le 2q$, $x_j > 0$ for j > 2q, and $\sum_{j=2q+1}^{r} x_j < 1$. For x in O, $\theta$
real, and y in J, let

$$h(x, \theta) = \prod_{j=1}^{q} [1 + \exp(x_{q+j}\theta - x_j)]^{-1},$$

$$v_y(x) = \sum_{j=1}^{q} x_j y_j,$$

$$t_y(x) = \sum_{j=1}^{q} x_{q+j} y_j,$$

and

$$c_y(x, \theta) = \exp[-v_y(x)]\, h(x, \theta) \exp[\theta t_y(x)].$$

Let the function g from O onto $S_{2T}$ be defined so that, for y in J, g(x) has coordinate

$$g_y(x) = \sum_{k=1}^{K} x_{2q+k}\, c_y(x, \tau_k),$$

where $x_{2q+K}$ denotes $1 - \sum_{j=2q+1}^{r} x_j$. If, for some real $a_j > 0$ and real $\gamma_j$, $1 \le j \le q$,
(6), (7), and (9) hold for the distribution function F of a random variable that is equal to $\tau_k$ with
probability $\pi_k > 0$, $1 \le k \le K$, where $\sum_{k=1}^{K} \pi_k = 1$, then $p = g(\eta)$ for $\eta$ in O such that $\eta_j = \gamma_j$
for $1 \le j \le q$, $\eta_{q+j} = a_j$ for $1 \le j \le q$, and $\eta_{2q+j} = \pi_j$ for $1 \le j \le K - 1$.
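
The map g can be evaluated directly for small q. The following is a minimal sketch, assuming that J is the set of all $2^q$ response patterns in $\{0,1\}^q$, that $v_y(x) = \sum_j x_j y_j$, and that $t_y(x) = \sum_j x_{q+j} y_j$; the function name `pattern_probs` and the parameter values are hypothetical.

```python
import itertools

import numpy as np


def pattern_probs(gamma, a, tau, pi):
    """Return (Y, c, g): patterns y, conditional probabilities c_y(x, tau_k), and g_y(x)."""
    gamma, a, tau, pi = (np.asarray(z, dtype=float) for z in (gamma, a, tau, pi))
    q = len(gamma)
    # Rows of Y enumerate every response pattern y in {0, 1}^q (assumed to be J).
    Y = np.array(list(itertools.product([0, 1], repeat=q)), dtype=float)
    v = Y @ gamma  # v_y(x) = sum_j gamma_j y_j
    t = Y @ a      # t_y(x) = sum_j a_j y_j
    # h(x, tau_k) = prod_j [1 + exp(a_j tau_k - gamma_j)]^{-1}, one value per latent class.
    h = np.prod(1.0 / (1.0 + np.exp(np.outer(tau, a) - gamma)), axis=1)
    # c_y(x, tau_k) = exp(-v_y) h(x, tau_k) exp(tau_k t_y), shape (2^q, K).
    c = np.exp(-v[:, None] + np.outer(t, tau)) * h[None, :]
    g = c @ pi     # g_y(x) = sum_k pi_k c_y(x, tau_k)
    return Y, c, g


Y, c, g = pattern_probs(
    gamma=[0.2, -0.5, 1.0], a=[1.0, 0.7, 1.3],
    tau=[-1.0, 0.0, 1.5], pi=[0.3, 0.3, 0.4],
)
print(c.sum(axis=0))  # each column is a probability distribution over patterns
print(g.sum())
```

Each column of `c` is the conditional distribution of the response pattern at one latent point, so the columns and the mixture `g` should each sum to 1 up to rounding.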

Let

$$d_j(x, \theta) = [1 + \exp(-x_{q+j}\theta + x_j)]^{-1}$$

for $1 \le j \le q$.


Then the partial derivatives of $g_y$ are the continuous functions $g_{jy}$ with values $g_{jy}(x)$ of

$$-\sum_{k=1}^{K} x_{2q+k}\,[y_j - d_j(x, \tau_k)]\, c_y(x, \tau_k)$$

for $1 \le j \le q$,

$$\sum_{k=1}^{K} x_{2q+k}\,\tau_k\,[y_{j-q} - d_{j-q}(x, \tau_k)]\, c_y(x, \tau_k)$$

for $q + 1 \le j \le 2q$, and

$$c_y(x, \tau_{j-2q}) - c_y(x, \tau_K)$$

for $2q + 1 \le j \le r$. Thus all conditions in Theorem 7 are satisfied. In Theorem 8, it is often the
case that U is indeed nonempty and $\eta$ is in U, but the matrix I is so close to singular that $\eta$
cannot be accurately estimated by use of a reasonable sample size.
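
The derivative formulas can be checked with a short calculation. The sketch below is not taken from the report; it assumes $v_y(x) = \sum_j x_j y_j$ and $t_y(x) = \sum_j x_{q+j} y_j$, so that $c_y(x, \theta)$ is the conditional probability of pattern y given ability $\theta$.

```latex
\begin{align*}
\log c_y(x,\theta) &= -\sum_{j=1}^{q} x_j y_j + \theta \sum_{j=1}^{q} x_{q+j} y_j
  - \sum_{j=1}^{q} \log[1 + \exp(x_{q+j}\theta - x_j)], \\
\frac{\partial}{\partial x_j} \log c_y(x,\theta) &= -[y_j - d_j(x,\theta)],
  \qquad 1 \le j \le q, \\
\frac{\partial}{\partial x_{q+j}} \log c_y(x,\theta) &= \theta\,[y_j - d_j(x,\theta)],
  \qquad 1 \le j \le q.
\end{align*}
```

Multiplying each line by $x_{2q+k}\, c_y(x, \tau_k)$, setting $\theta = \tau_k$, and summing over k gives the derivatives with respect to the first 2q coordinates. The derivatives with respect to the remaining coordinates follow from $x_{2q+K} = 1 - \sum_{j=2q+1}^{r} x_j$.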

The problem of near singularity of I arises primarily from the behavior of the partial
derivatives $g_{jy}$ for j > 2q that are associated with the probabilities $\pi_k$, $1 \le k \le K - 1$. Recall
from elementary linear algebra that the smallest eigenvalue of I is the minimum of $b'Ib$ for b
in $R^r$ such that $b'b = 1$. Let $z_y$ denote the K-dimensional vector with coordinates $c_y(\eta, \tau_k)$ for
$1 \le k \le K$. Then this minimum is certainly no greater than the minimum of

$$\Psi(b) = \sum_{y \in J} p_y^{-1} (b'z_y)^2$$

for b in $R^K$ such that $\sum_{k=1}^{K} b_k^2 = 1$ and $\sum_{k=1}^{K} b_k = 0$.

For fixed $\eta$, $c_y(\eta, \tau)$ is infinitely differentiable in $\tau$. Let $\bar{v}$ be the average of the $v_y$ for y in J,
and let $\bar{\tau}$ be the average of the $\tau_k$ for $1 \le k \le K$. For each integer $r \ge 1$, Taylor's theorem can be
used to show that $z_y$ has a polynomial approximation $u_y$ with coordinate k equal to

$$\exp(t_y) \sum_{u=0}^{r} \sum_{s=0}^{u} w_{us}\, (\tau_k - \bar{\tau})^u (v_y - \bar{v})^s.$$

If (r + 1)/2 < K, then the $b_k$ can be selected so that $b'u_y$ is 0 for each y in J. Thus

$$\Psi(b) = \sum_{y \in J} p_y^{-1} [b'(z_y - u_y)]^2.$$

As K increases and the $\tau_k$ are all in a fixed finite interval, the argument implies that the minimum
of $\Psi(b)$ should approach 0 rather rapidly. Thus it is reasonable to expect that I will be nearly
singular even for K somewhat less than $2^q - 2q$.
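
The selection of b can be illustrated numerically. The sketch below uses a sufficient construction that is an assumption of this example rather than the report's own argument: b is chosen orthogonal to the vectors of powers $(\tau_k - \bar{\tau})^u$, $0 \le u \le r$, which makes $b'u_y = 0$ for every y and requires K > r + 1. The function name is hypothetical.

```python
import numpy as np


def annihilating_vector(tau, r):
    """Unit vector b with sum_k b_k (tau_k - tau_bar)^u = 0 for u = 0, ..., r."""
    tau = np.asarray(tau, dtype=float)
    K = len(tau)
    if K <= r + 1:
        raise ValueError("this sketch needs K > r + 1 for a nonzero solution")
    centered = tau - tau.mean()
    # Row u of V holds the powers (tau_k - tau_bar)^u, for u = 0, ..., r.
    V = np.vander(centered, N=r + 1, increasing=True).T  # shape (r + 1, K)
    # V has rank at most r + 1 < K, so the last right singular vector lies in its null space.
    _, _, vt = np.linalg.svd(V)
    b = vt[-1]
    return b, V


# Latent points from the ten-class example: tau_k = -1.8 + 0.4 (k - 1).
tau = -1.8 + 0.4 * np.arange(10)
b, V = annihilating_vector(tau, r=3)
print(np.abs(V @ b).max())  # residual of the orthogonality constraints
print(b.sum(), b @ b)       # constraints sum b_k = 0 and sum b_k^2 = 1
```

The row for u = 0 is a vector of ones, so the constraint $\sum_k b_k = 0$ is satisfied automatically, and the singular vector has unit length by construction.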

This argument suggests that the problem of near singularity is less severe for a given number
of latent categories K if q is larger, for the variability of $v_y$ is then increased. On the other hand,
the argument is also relevant in the Rasch model case of $a_j$ constant, so that the use of a model
with a finite number of values of $\theta$ does not necessarily lead to well-identified parameters even in
the Rasch model unless K is relatively small.

For some understanding of the issue, consider a case in which q = 11; the item difficulties are
$\beta_j = -3 + 3(j - 1)/5$ and the item discriminations are $a_j = 0.5 + (j - 1)/10$ for $1 \le j \le q$, and
the examinee ability values and respective probabilities are $\tau_k = k - 3$ and $\pi_k = 0.2$ for $1 \le k \le 5$.
Recall that $\gamma_j = a_j \beta_j$ for $1 \le j \le q$. In this case, the largest element of $I^{-1}$ is 341.5, so that each
parameter has an asymptotic standard deviation less than 0.1 if the sample size is about 34,000.
If K is increased to 10, the $\pi_k$ are each 0.1, and $\tau_k = -1.8 + 0.4(k - 1)$ for $1 \le k \le 10$, then the
largest element of the inverse of the information matrix I is 8,302,862.9, so that comparable
accuracy of parameter estimates requires about 830,000,000 observations. Thus in the second case,
it is unreasonable to expect satisfactory performance from maximum likelihood. In the former
case, some hope exists. The most extreme problems involve diagonal elements of the information
matrix that correspond to the latent probabilities $\pi_k$; however, it should be emphasized that other
parameter estimates are also affected. The smallest value of a diagonal element of $I^{-1}$ is only 0.09
for the case of five latent classes, while the corresponding figure for 10 latent classes is 72.3.
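
A calculation of this kind can be sketched as follows. The code assumes that I is the per-observation multinomial information matrix with entries $\sum_{y} g_{jy}(\eta) g_{ky}(\eta)/g_y(\eta)$ and that J contains all $2^q$ response patterns; Theorems 7 and 8 are not reproduced in this section, so whether the output matches the figures quoted above depends on those assumptions. The function names are hypothetical.

```python
import itertools

import numpy as np


def probs_and_grads(gamma, a, tau, pi):
    """Pattern probabilities g_y and their partial derivatives (latent-class 2PL)."""
    gamma, a, tau, pi = (np.asarray(z, dtype=float) for z in (gamma, a, tau, pi))
    q = len(gamma)
    Y = np.array(list(itertools.product([0, 1], repeat=q)), dtype=float)  # assumed J
    lin = np.outer(tau, a) - gamma                 # a_j tau_k - gamma_j, shape (K, q)
    d = 1.0 / (1.0 + np.exp(-lin))                 # d_j(x, tau_k)
    c = np.exp(Y @ lin.T) * np.prod(1.0 - d, axis=1)  # c_y(x, tau_k), shape (2^q, K)
    p = c @ pi                                     # g_y(x)
    w = c * pi                                     # pi_k c_y(x, tau_k)
    resid = Y[:, None, :] - d[None, :, :]          # y_j - d_j(x, tau_k)
    G_gamma = -np.einsum("yk,ykj->yj", w, resid)   # derivatives for gamma_j
    G_a = np.einsum("yk,k,ykj->yj", w, tau, resid) # derivatives for a_j
    G_pi = c[:, :-1] - c[:, [-1]]                  # derivatives for pi_1, ..., pi_{K-1}
    return p, np.hstack([G_gamma, G_a, G_pi])


def max_inverse_information(gamma, a, tau, pi):
    """Largest element of the inverse of the assumed information matrix."""
    p, G = probs_and_grads(gamma, a, tau, pi)
    info = (G / p[:, None]).T @ G                  # sum_y g_jy g_ky / g_y
    return float(np.max(np.linalg.inv(info)))


q = 11
j = np.arange(1, q + 1)
beta = -3.0 + 3.0 * (j - 1) / 5.0   # item difficulties
a = 0.5 + (j - 1) / 10.0            # item discriminations
gamma = a * beta

m5 = max_inverse_information(gamma, a, np.arange(1, 6) - 3.0, np.full(5, 0.2))
m10 = max_inverse_information(gamma, a, -1.8 + 0.4 * np.arange(10), np.full(10, 0.1))

for label, m in (("K = 5", m5), ("K = 10", m10)):
    # Sample size n at which sqrt(m / n), the largest asymptotic standard deviation, is 0.1.
    print(label, m, m / 0.1**2)
```

The last printed column applies the same arithmetic as the text: a largest element m of $I^{-1}$ gives an asymptotic standard deviation of $\sqrt{m/n}$, so a target of 0.1 requires n of about 100m.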

The problems considered here are not solved if the $a_j$ are all constant, as in the Rasch model.
For the case just examined with K = 10, change all $a_j$ to 1. Then the maximum value of an
element of $I^{-1}$ is 1,734,948.5.

Even if the Rasch model is assumed to hold with each $a_j = 1$, problems persist. Define I
based on the function g in the proof of Theorem 12. The inverse information matrix for this case
has a largest diagonal element of size 1,734,210.6. The difficulties encountered do disappear if the
latent probabilities $\pi_k$ are known. For the original 2PL case of K = 10, the maximum element of
the inverse information matrix for the remaining parameters $\gamma_j$ and $a_j$ is only 77.9. Comparable
results are achieved for the normal 2PL model. In this case, the maximum element of the inverse
information matrix is 22.1.

It is somewhat difficult to characterize precisely when near singularity occurs given that q, $\beta_j$,
$a_j$, K, $\pi_k$, and $\tau_k$ all have impact; however, it certainly should not be assumed that use of the 2PL
model with latent classes will be satisfactory without regard to the choice of K and $\tau_k$. In typical
situations, it should also be understood that results for the 3PL model are likely to be even worse.

5. Conclusions 

Unlike in the case of the Rasch model, the 2PL and 3PL models do not provide a simple 
approach for parameter estimation in which a parametric model for the ability distribution is not 
assumed. In addition, simple approaches based on latent classes can be very unsatisfactory unless 
the number of latent classes is rather small. Thus attempts to use 2PL and 3PL models with more 
general ability distributions than the standard normal distribution require quite careful work, and 
steps must be taken to verify that parameters are determined with reasonable accuracy. 